Query Segmentation and Resource Disambiguation Leveraging Background Knowledge
Abstract
Accessing the wealth of structured data available on the Data Web is still a key challenge for lay users. Keyword search is the most convenient way for users to access information (e.g., from data repositories). In this paper we introduce a novel approach, based on a hidden Markov model, for determining the correct resources for user-supplied keyword queries. In our approach the user-supplied query is modeled as the observed data and the background knowledge is used for parameter estimation. Instead of learning the parameters from training data, we leverage the semantic relationships between data items to compute the parameter estimates. In order to maximize accuracy and usability, query segmentation and resource disambiguation are tightly interwoven. First, an initial set of potential segmentations is obtained by leveraging the underlying knowledge base; then the final set of segments is determined after the most likely resource mapping has been computed using a scoring function. While linguistic methods such as named entity recognition, multi-word unit recognition and POS tagging fail on incomplete sentences (e.g., keyword-based queries), we show that our statistical approach is robust with regard to query expression variance. Our experiments employing the hidden Markov model for resource identification in keyword queries yield very promising results.
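As a rough illustration of the idea in the abstract, the sketch below treats query segments as observations and candidate resources as hidden states, then uses Viterbi decoding to pick the most likely resource sequence. It is a minimal sketch under stated assumptions, not the authors' implementation: the resource names (`ex:capital`, `ex:Capital_Records`, `ex:Germany`) are hypothetical, the segmentation is taken as already fixed, and all probabilities are hand-picked toy values rather than estimates derived from a knowledge base.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log-probability, best state path) for the observed segments."""
    # best[s] = (log-prob of the best path ending in state s, that path)
    best = {
        s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), [s])
        for s in states
    }
    for obs in observations[1:]:
        step = {}
        for s in states:
            # Pick the predecessor that maximizes the path score into s.
            prev = max(states, key=lambda p: best[p][0] + math.log(trans_p[p][s]))
            score = (
                best[prev][0]
                + math.log(trans_p[prev][s])
                + math.log(emit_p[s][obs])
            )
            step[s] = (score, best[prev][1] + [s])
        best = step
    return max(best.values(), key=lambda item: item[0])


# Observed data: the segments of the keyword query "capital germany".
segments = ["capital", "germany"]

# Hidden states: hypothetical candidate resources (assumed names).
resources = ["ex:capital", "ex:Capital_Records", "ex:Germany"]

# Toy parameters (assumptions). In the paper these would come from the
# background knowledge instead of being set by hand.
start_p = {r: 1.0 / len(resources) for r in resources}
emit_p = {
    "ex:capital": {"capital": 0.9, "germany": 0.1},
    "ex:Capital_Records": {"capital": 0.8, "germany": 0.2},
    "ex:Germany": {"capital": 0.05, "germany": 0.95},
}
trans_p = {
    "ex:capital": {"ex:capital": 0.1, "ex:Capital_Records": 0.1, "ex:Germany": 0.8},
    "ex:Capital_Records": {"ex:capital": 0.3, "ex:Capital_Records": 0.4, "ex:Germany": 0.3},
    "ex:Germany": {"ex:capital": 0.6, "ex:Capital_Records": 0.1, "ex:Germany": 0.3},
}

log_p, path = viterbi(segments, resources, start_p, trans_p, emit_p)
print(path)  # ['ex:capital', 'ex:Germany']
```

With these toy values the ambiguous keyword "capital" resolves to the property rather than the record label, because the transition to `ex:Germany` is assumed to be much more likely from `ex:capital`. That is the role the abstract assigns to semantic relationships between data items.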
Similar resources
Preliminary Lexical Framework For English-Arabic Semantic Resource Construction
This paper describes preliminary work concerning the creation of a Framework to aid in lexical semantic resource construction. The Framework consists of 9 stages during which various lexical resources are collected, studied, and combined into a single combinatory lexical resource. To evaluate the general Framework it was applied to a small set of English and Arabic resources, automatically comb...
Knowledge-based and vertical-driven information retrieval
The paper introduces the architecture and functionality of the knowledge-based information retrieval technology developed at Vertical Search Works. A large-scale language-independent ontology is used during indexing, query analysis, and document retrieval as part of a web-scale vertical search engine. Three specific areas are examined: the knowledge resource, its visualization and editing toolb...
SENSEABLE SEARCH: Selective Query Disambiguation
We present a method for detecting and resolving lexical ambiguity in information retrieval queries. Leveraging existing word sense disambiguation tools, we define a measure of query term ambiguity based on the distribution of word senses in the relevant document set. If a query term is ambiguous, we allow the user to select the correct sense of the query term, in the style of Google’s spelling ...
Entity Disambiguation with Linkless Knowledge Bases
Named Entity Disambiguation is the task of disambiguating named entity mentions in natural language text and linking them to their corresponding entries in a reference knowledge base (e.g. Wikipedia). Such disambiguation can help add semantics to plain text and distinguish homonymous entities. Previous research has tackled this problem by making use of two types of context-aware features derived f...
Entity Recognition and Linking in Chinese Search Queries
Aiming at the task of Entity Recognition and Linking in Chinese Search Queries in NLP&CC 2015, this paper proposes the solutions to entity recognition, entity linking and entity disambiguation. Dictionary, online knowledge base and SWJTU Chinese word segmentation are used in entity recognition. Synonyms thesaurus, redirect of Wikipedia and the combination of improved PED (Pinyin Edit Distance) ...
Publication date: 2012